Detecting CD8 Tpex with ProjecTILs

In this tutorial, we will present how to use ProjecTILs to detect Tpex in public datasets that were not detecting them. This CD8 human TILs reference was built using Nick Borcherding collection of single-cell datasets from tumor patients. The map consists of 11,021 high-quality single-cell transcriptomes from 20 samples, covering 7 tumor types.

The process and code to build the map can be found in this Github repo.

Briefly, this reference was built using curated CD8+ T cells from 20 tumor-infiltrating patients, and integrated using semi-supervised STACAS. Unsupervised clustering was performed on the integrated samples. Clusters were manually annotated with respect to classical immunology markers, with a special interest in detecting Tpex. Clusters were later downsampled to have at most 2000 cells per cluster. In our experience, this optimized cluster balance improves projection, as reported here and here. Finally, this map was converted as a ProjecTILs reference. Please note that, as this reference was constructed with tumor-infiltrating samples, it might not work perfectly when mapping other tissues, such as blood or draining lymph nodes (DLN).

Why doing projection?

Projection classifies cells using a well curated ProjecTILs reference map.

  • This method has the benefit of using the same cell types across projects, which is especially beneficial when analyzing huge collections of datasets.

  • This reference is also a way to annotate datasets only by stable cell types, and not transient cell states, like cell cycle or activation.

  • Results will be comparable across conditions, as there is little manual tuning. For instance, there is no need for steps that can have a high impact, such as the selection of highly variable genes.

  • Projection is robust to batch effects, like single-cell technologies or sequencing depth (for more information, please read ProjecTILs paper - Andreatta et al. 2021).

What happens if some cells are not covered by the reference

If some cells are not covered by the reference, they automatically get filtered-out (eg. CD4 T cells if the reference is CD8 T cells).

By default, both Run.ProjecTILs() and ProjecTILs.classifier() have the parameter filter.cells set as TRUE. This means that cells out of reference will be filtered-out using the built-in scGate model. This model is stored in the slot misc of the reference Seurat object: ref@misc$scGate. You can custom this filtering by amending this slot using scGate grammar.

Human CD8 TIL reference

First, let’s have a look at the reference map.

# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")
## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette

# DimPlot
DimPlot(ref.cd8,  group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)

Here are the different T cell subsets defined in the map:

  • CD8.NaiveLike: Antigen-naive T cells

  • CD8.CM: Central Memory T cells

  • CD8.EM: Effector Memory

  • CD8.TEMRA: Effector Memory cells re-expressing CD45RA. Sometimes called Short Lived Effectors (SLEC), or Cytotoxic effectors

  • CD8.TPEX: Progenitors exhausted T cells

  • CD8.TEX: Exhausted T cells

  • CD8.MAIT: MAIT cells, innate-like T cells defined by their semi-invariant αβ T cell receptor (TCR)

Let’s check Differentially Expressed Genes (DEGs) between each cluster, to confirm clusters marker genes.

# Compute DEGs
DefaultAssay(ref.cd8) <- "RNA"
ref.cd8 <- NormalizeData(ref.cd8)
markers <- FindAllMarkers(object = ref.cd8, only.pos = TRUE, assay = "RNA")

# Remove TCR genes
tcr.genes <- SignatuR::GetSignature(SignatuR$Hs$Compartments$TCR)
markers <- markers %>% filter(!gene %in% unname(tcr.genes))
markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_log2FC) -> top3

# Plot heatmap
VlnPlot(ref.cd8, assay = "RNA", features  = top3$gene, cols = mycols, stack = T, flip = T, fill.by = "ident") + NoLegend()

Progenitors exhauted (Tpex)

Pioneering work in the murine lymphocytic choriomeningitis virus (LCMV) model has mapped the molecular and phenotypic profiles of CD8+ T cells, revealing progenitors of exhausted T cells (TPEX), defined by the expression of transcription factors, TOX and TCF1. These cells arise in the acute-phase of infection and sustain terminally exhausted subsets over the long-term. As they are thought to renew the pool of terminally exhausted cells, an increasing number of reports show that this population is of primary importance for cancer immunotherapy. (Utzschneideret al., Siddiqui et al.).

Here are some marker genes to help identify them:

Positive markers: TCF7, CD200, CRTAM, GNG4, TOX, LEF1, CCR7, CXCL13, XCL1, XCL2

Negative markers: GZMB, NKG7, PRF1, HAVCR2, CCL5, GZMA

Tumor T cell differentiation model, going through an intermediate state of progenitor exhausted (Tpex). Figure from Andreatta et al. 2021.

Let’s display 6 positive and 6 negative TPEX markers, that are especially useful to distinguish TPEX from closely related TEX.

DefaultAssay(ref.cd8) <- 'RNA'
FeaturePlot(ref.cd8, features = c('TCF7','XCL1','XCL2',"CXCL13","TOX","CRTAM","GZMB","NKG7","CCL5","HAVCR2","PRF1","GZMA"), ncol = 3, pt.size = 0.2, order = T, cols = pals::coolwarm()) & NoLegend()

Tpex importance is of growing interest, but except a few studies (Oliveira et al., Magen et al., Zheng et al.), this subset has been harder to detect in human. Having a human CD8 reference with clearly annotated Tpex solves this issue.

Detecting Tpex in Gueguen et al. 2021

Setup data

#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)

Projection

Thanks to automatic scGate filtering, only the CD8 clusters (upper part of the UMAP) were mapped.

# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          2132          3181           147           238           276 
##       CD8.TEX      CD8.TPEX 
##          3340           337
DimPlot(gueguen.cd3, order = T,  label = T, repel = T) 

DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

We can check specific TPEX genes to confirm that identities match.

DefaultAssay(gueguen.cd3) <- 'RNA'
FeaturePlot(gueguen.cd3, features = c('XCL1','XCL2'), ncol = 2, pt.size = 0.5, order = T, cols = pals::coolwarm()) & NoLegend()

# Radar plots
p <- plot.states.radar(ref.cd8, query = gueguen.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T) 
wrap_plots(p) + theme_bw()

We can see that the previously homogeneous cluster CD8-LAYN is actually composed of two subsets: CD8.TEX and CD8.TPEX.

How to assess quality/robustness of mapping

It can be hard to make the call between cells modified, and cells plainly wrongly mapped. We usually recommend assessing mapping consistency by checking consistency among top markers. If the query seems quite different from the reference, we recommend understanding DEGs between the reference and the query, for each cell type of interest.

Detecting Tpex in Yost et al. 2019

Setup data

# Load data
#download.file("https://figshare.com/ndownloader/files/39109277", destfile = "Yost.cd3.Rds")
Yost.cd3 <- readRDS("Yost.cd3.Rds")

# Normalize data
Yost.cd3 <- NormalizeData(Yost.cd3)
Yost.cd3 <- ScaleData(Yost.cd3)
## Centering and scaling data matrix
# DimPlots
DimPlot(Yost.cd3, reduction = 'umap', group.by = 'cluster', label = T)

DimPlot(Yost.cd3, reduction = 'umap', group.by = 'patient', label = T, repel = T)

Projection

As this dataset is a mix between CD4 and CD8 T cells, we will keep the parameter filter.cells as TRUE to keep only CD8+ T cells.

DefaultAssay(Yost.cd3) <- "RNA"
Yost.cd3 <- ProjecTILs.classifier(Yost.cd3, ref = ref.cd8, filter.cells = T, split.by = 'patient', ncores = 6)
table(Yost.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          3806          5296           316           438           965 
##       CD8.TEX      CD8.TPEX 
##          2370           499
DimPlot(Yost.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

We indeed detect TPEX. Let’s check globally how the expression profiles look.

# Radar plots
p <- plot.states.radar(ref.cd8, query = Yost.cd3, min.cells = 10, genes4radar = c('LEF1', "TCF7", "CCR7", "GZMK", "FGFBP2",'FCGR3A','ZNF683','ITGAE', "CRTAM", "CD200",'GNG4', "HAVCR2", "TOX", "ENTPD1", 'TYROBP','KIR2DL1'), return = T) 
wrap_plots(p) + theme_bw()

We can see that in Yost et al. dataset, CD8.TPEX can be found with profiles matching the reference. TPEX are found scattered all over the original UMAP space. This is because activation signals and other confounding factors were contributing to defining UMAP space. Reference-based annotation uncovers cell types behind the activation program. If you are interested in recovering cell types hidden by transient cell states, you can read more in the corresponding tutorial.

Session Info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
##  [1] plotly_4.10.1          EnhancedVolcano_1.14.0 pheatmap_1.0.12       
##  [4] UCell_2.2.0            scales_1.2.1           ggrepel_0.9.2         
##  [7] SignatuR_0.1.0         gridExtra_2.3          ProjecTILs_3.0.2      
## [10] patchwork_1.1.2        GEOquery_2.66.0        Biobase_2.58.0        
## [13] BiocGenerics_0.44.0    data.table_1.14.6      STACAS_2.0.0          
## [16] scGate_1.4.1           forcats_0.5.2          stringr_1.5.0         
## [19] dplyr_1.0.10           purrr_1.0.1            readr_2.1.3           
## [22] tidyr_1.2.1            tibble_3.1.8           ggplot2_3.4.0         
## [25] tidyverse_1.3.2        SeuratObject_4.1.3     Seurat_4.3.0          
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                  spatstat.explore_3.0-5     
##   [3] reticulate_1.27             tidyselect_1.2.0           
##   [5] htmlwidgets_1.6.1           grid_4.2.1                 
##   [7] BiocParallel_1.32.5         Rtsne_0.16                 
##   [9] munsell_0.5.0               codetools_0.2-18           
##  [11] ica_1.0-3                   umap_0.2.9.0               
##  [13] future_1.30.0               miniUI_0.1.1.1             
##  [15] withr_2.5.0                 spatstat.random_3.1-2      
##  [17] colorspace_2.1-0            progressr_0.13.0           
##  [19] highr_0.10                  knitr_1.41                 
##  [21] rstudioapi_0.14             stats4_4.2.1               
##  [23] SingleCellExperiment_1.20.0 ROCR_1.0-11                
##  [25] tensor_1.5                  listenv_0.9.0              
##  [27] labeling_0.4.2              MatrixGenerics_1.10.0      
##  [29] GenomeInfoDbData_1.2.9      polyclip_1.10-4            
##  [31] farver_2.1.1                parallelly_1.34.0          
##  [33] vctrs_0.5.2                 generics_0.1.3             
##  [35] xfun_0.36                   timechange_0.2.0           
##  [37] R6_2.5.1                    GenomeInfoDb_1.34.7        
##  [39] rmdformats_1.0.4            pals_1.7                   
##  [41] bitops_1.0-7                spatstat.utils_3.0-1       
##  [43] cachem_1.0.6                DelayedArray_0.24.0        
##  [45] assertthat_0.2.1            promises_1.2.0.1           
##  [47] googlesheets4_1.0.1         gtable_0.3.1               
##  [49] globals_0.16.2              goftest_1.2-3              
##  [51] rlang_1.0.6                 splines_4.2.1              
##  [53] lazyeval_0.2.2              gargle_1.2.1               
##  [55] dichromat_2.0-0.1           spatstat.geom_3.0-5        
##  [57] broom_1.0.2                 BiocManager_1.30.19        
##  [59] yaml_2.3.7                  reshape2_1.4.4             
##  [61] abind_1.4-5                 modelr_0.1.10              
##  [63] backports_1.4.1             httpuv_1.6.8               
##  [65] tools_4.2.1                 bookdown_0.32              
##  [67] ellipsis_0.3.2              jquerylib_0.1.4            
##  [69] RColorBrewer_1.1-3          ggridges_0.5.4             
##  [71] Rcpp_1.0.10                 plyr_1.8.8                 
##  [73] zlibbioc_1.44.0             RCurl_1.98-1.9             
##  [75] openssl_2.0.5               deldir_1.0-6               
##  [77] pbapply_1.7-0               cowplot_1.1.1              
##  [79] S4Vectors_0.36.1            zoo_1.8-11                 
##  [81] SummarizedExperiment_1.28.0 haven_2.5.1                
##  [83] cluster_2.1.4               fs_1.6.0                   
##  [85] magrittr_2.0.3              RSpectra_0.16-1            
##  [87] scattermore_0.8             lmtest_0.9-40              
##  [89] reprex_2.0.2                RANN_2.6.1                 
##  [91] googledrive_2.0.0           fitdistrplus_1.1-8         
##  [93] matrixStats_0.63.0          hms_1.1.2                  
##  [95] mime_0.12                   evaluate_0.20              
##  [97] xtable_1.8-4                readxl_1.4.1               
##  [99] IRanges_2.32.0              compiler_4.2.1             
## [101] maps_3.4.1                  KernSmooth_2.23-20         
## [103] crayon_1.5.2                htmltools_0.5.4            
## [105] later_1.3.0                 tzdb_0.3.0                 
## [107] lubridate_1.9.0             DBI_1.1.3                  
## [109] dbplyr_2.3.0                MASS_7.3-58.2              
## [111] data.tree_1.0.0             Matrix_1.5-3               
## [113] cli_3.6.0                   parallel_4.2.1             
## [115] igraph_1.3.5                GenomicRanges_1.50.2       
## [117] pkgconfig_2.0.3             sp_1.6-0                   
## [119] spatstat.sparse_3.0-0       xml2_1.3.3                 
## [121] bslib_0.4.2                 XVector_0.38.0             
## [123] rvest_1.0.3                 digest_0.6.31              
## [125] pracma_2.4.2                sctransform_0.3.5          
## [127] RcppAnnoy_0.0.20            spatstat.data_3.0-0        
## [129] rmarkdown_2.20              cellranger_1.1.0           
## [131] leiden_0.4.3                uwot_0.1.14                
## [133] shiny_1.7.4                 lifecycle_1.0.3            
## [135] nlme_3.1-161                jsonlite_1.8.4             
## [137] BiocNeighbors_1.16.0        mapproj_1.2.9              
## [139] askpass_1.1                 viridisLite_0.4.1          
## [141] limma_3.54.0                fansi_1.0.4                
## [143] pillar_1.8.1                lattice_0.20-45            
## [145] fastmap_1.1.0               httr_1.4.4                 
## [147] survival_3.5-0              glue_1.6.2                 
## [149] png_0.1-8                   stringi_1.7.12             
## [151] sass_0.4.5                  renv_0.15.5                
## [153] irlba_2.3.5.1               future.apply_1.10.0